08. Activity Classification

ND320 C4 L3 09 Activity Classifier

Activity Classification Summary

We've explored the data, examined the literature, chosen our features, and pre-processed all the data. Now it's time to finally build the classifier!

First, we extract features from 10-second-long, non-overlapping windows to train on. We then use sklearn to build a random forest classifier for our data, setting the hyperparameters to 100 trees, each with a maximum depth of 4.

Now we are ready to build and train the model!
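The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the sampling rate `FS`, the helper `window_features`, the three placeholder features, and the synthetic "quiet"/"active" signals are all assumptions made for the example. Only the 10-second non-overlapping windows and the two hyperparameters (100 trees, maximum depth 4) come from the lesson.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FS = 256       # assumed sampling rate in Hz (hypothetical, not from the lesson text)
WINDOW_S = 10  # 10-second windows, as described above


def window_features(signal, fs=FS, window_s=WINDOW_S):
    """Split a 1-D signal into non-overlapping windows and compute toy features."""
    n = fs * window_s                      # samples per window
    n_windows = len(signal) // n           # drop any incomplete trailing window
    windows = signal[: n_windows * n].reshape(n_windows, n)
    # Placeholder features (mean, standard deviation, peak-to-peak);
    # the lesson's real feature set is not reproduced here.
    return np.column_stack([
        windows.mean(axis=1),
        windows.std(axis=1),
        np.ptp(windows, axis=1),
    ])


rng = np.random.default_rng(0)
# Synthetic stand-in data: two made-up "activities" with different amplitudes.
quiet = rng.normal(0.0, 0.1, FS * WINDOW_S * 20)
active = rng.normal(0.0, 1.0, FS * WINDOW_S * 20)

X = np.vstack([window_features(quiet), window_features(active)])
y = np.array(["quiet"] * 20 + ["active"] * 20)

# Hyperparameters from the lesson: 100 trees, each with a maximum depth of 4.
clf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=42)
clf.fit(X, y)
```

Each row of `X` is one window and each column is one feature, which is the layout sklearn estimators expect.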

Because we just trained on the whole dataset, we have no held-out data and can't easily evaluate the model yet. One way to evaluate the performance of a multi-class classifier is to look at a confusion matrix, which shows how many data points were misclassified and what they were misclassified as.
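As a small illustration of reading a confusion matrix, the sketch below uses `sklearn.metrics.confusion_matrix` on made-up labels. The class names and predictions are hypothetical and are not results from the lesson's model. In sklearn's convention, rows are the true classes and columns are the predicted classes, so off-diagonal entries are the misclassifications.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical class names and predictions, for illustration only.
labels = ["bike", "run", "walk"]
y_true = np.array(["bike", "bike", "run", "run", "run",
                   "walk", "walk", "walk", "walk"])
y_pred = np.array(["bike", "walk", "run", "run", "run",
                   "walk", "bike", "bike", "walk"])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Find the most common single error: the largest off-diagonal entry.
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
i, j = np.unravel_index(off_diagonal.argmax(), off_diagonal.shape)
worst = (labels[i], labels[j])  # (true class, predicted class)

print(cm)
print(f"Most common error: calling {worst[0]} {worst[1]}")
```

With this toy data, the largest off-diagonal entry sits in the "walk" row and "bike" column, meaning the main error is calling walk samples bike.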

ND320 C4 L3 10 Leave-One-Subject-Out Cross Validation

Activity Classification Recap

Summary

In this lesson, we finally use our features to train a random forest model. We talk about model performance and use cross-validation to estimate our accuracy. We end up with a model with an overall classification accuracy of 73%, which is the percentage of correct classifications made by the model. But don’t fret, we’ll do better in the next video!
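One way to sketch leave-one-subject-out cross-validation is with sklearn's `LeaveOneGroupOut`, using the subject ID of each window as the group. The data below is synthetic and the subject and class counts are assumptions, so the resulting accuracy says nothing about the lesson's 73% figure. The point is only the mechanics: each fold trains on all subjects but one and tests on the subject that was left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_subjects, per_subject = 4, 30  # hypothetical sizes

# Synthetic stand-in data: 3 classes whose features are shifted apart.
y = np.tile(np.repeat([0, 1, 2], per_subject // 3), n_subjects)
X = rng.normal(size=(n_subjects * per_subject, 3)) + y[:, None] * 2.0
# groups[k] is the subject that window k came from.
groups = np.repeat(np.arange(n_subjects), per_subject)

logo = LeaveOneGroupOut()
n_correct = 0
for train_idx, test_idx in logo.split(X, y, groups):
    # Train a fresh model on every subject except the held-out one.
    clf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=42)
    clf.fit(X[train_idx], y[train_idx])
    n_correct += np.sum(clf.predict(X[test_idx]) == y[test_idx])

# Overall accuracy: correct predictions pooled over all held-out subjects.
accuracy = n_correct / len(y)
print(f"Leave-one-subject-out accuracy: {accuracy:.2f}")
```

Splitting by subject rather than by window keeps windows from the same person out of both the training and test sets, so the estimate reflects performance on people the model has not seen.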

Quiz

Fruit Classifier

Take a look at the confusion matrix for a fruit classifier above. What is the main classification error that the fruit classifier makes?

SOLUTION: Calling bananas oranges

Code from Video

Notebook Review

If you want to interact with the notebook from the video, you can access it in the repo at /activity-classifier/walkthroughs/activity-classifier/ or in the workspace below.

The dataset that will be used throughout this lesson can be found at the top of the lesson directory at /activity-classifier/data/.

Code

If you need the code, you can find it on https://github.com/udacity.

Activity Classification Further Research

Further Resources

Random forests are ensembles of decision trees trained with bagging (bootstrap aggregation) rather than boosting. You need to understand a decision tree before learning what a random forest model is. Start with the sklearn tutorial on decision trees. Then check out these videos on YouTube for a visual explanation:

See this list of classification accuracy metrics that can be computed in sklearn.

Follow this series of blog posts for an understanding of how these accuracy metrics work on multiclass problems like ours.

Glossary

  • Cross-validation: A technique for estimating model performance in which multiple models are trained, each on a different partition of the entire dataset, and each is tested on the data held out from its training partition.
  • Classification accuracy: The percent of correct classifications made by a model.
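
To connect the accuracy definition to a confusion matrix, here is a minimal sketch with hypothetical counts (not taken from the lesson). Correct classifications lie on the diagonal, so accuracy is the diagonal sum divided by the total number of samples.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
cm = np.array([
    [50,  2,  3],
    [ 4, 30,  6],
    [ 5, 10, 40],
])

# Classification accuracy: correct predictions (diagonal) over all samples.
accuracy = np.trace(cm) / cm.sum()

# Per-class accuracy (recall): correct predictions over each row's total.
per_class = np.diag(cm) / cm.sum(axis=1)

print(f"Overall accuracy: {accuracy:.2f}")
print("Per-class accuracy:", np.round(per_class, 2))
```

Looking at the per-class values alongside the overall number helps show whether one class is dragging the accuracy down.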